Instructions

Read each question carefully and write your answer legibly on the examination paper. No other 
paper will be accepted. You may use the backs of pages for rough work but all final answers must 
be in the spaces provided. The marks for each question are as indicated. Allocate your time 
accordingly. 

Ensure that your name AND student number are clearly written on the examination paper and that 
your name is on every page. 

Note: a reference table of MIPS instructions is provided at the end of the examination paper. 


Question        Marks
1 (10 marks)    _____
2 (14 marks)    _____
3 (16 marks)    _____
4 (20 marks)    _____
5 (21 marks)    _____
6 (19 marks)    _____
Total           _____



Name:_ 

Student Number: 






1. General (10 marks in total - 1 mark for each part) Give the technical term that best fits each of 
the following descriptions or definitions. 

(a) With the IEEE 754 floating point standard, the result of dividing 0 by 0. 


(b) A cache storing recently used mappings between virtual and physical addresses. 


(c) In a pipelined datapath, one of these is used to store the results of each stage of instruction 
processing for use in the next stage. 


(d) A memory element storing a single bit, the value of which is changed only on a clock edge. 


(e) An organization that has developed a variety of widely used benchmark suites, including ones 
for assessing integer and floating-point CPU performance. 


(f) An approach to the control component of a processor in which the control signal settings 
required for each step are stored in a ROM in an instruction-like format rather than computed 
with a combinational circuit. 


(g) A statement that tells the assembler how to translate a program but does not produce machine 
language instructions; in SPIM always begins with a period. 


(h) An approach to speeding up integer addition in which carries are determined before all of the 
preceding sum bits, through computation of generate and propagate values. 


(i) An advanced pipelining technique that enables the processor to execute more than one 
instruction per clock cycle. 


(j) In processor cache management, a scheme that handles writes by initially updating only the 
cache; main memory is not updated until the updated block needs to be replaced from the 
cache to make room for a new block. 
2. Computer Performance (14 marks in total)

(a) (2 marks) One way to summarize results from running a set of benchmarks is to take the sum
of the benchmark execution times. Give two alternative methods of summarizing benchmark 
results, and state what advantages each may have. 


(b) (2 marks) Consider a system and application for which 25% of all cycles are spent on cache 
misses. Suppose that through changes to the memory and cache architecture we are able to 
decrease the number of cycles spent on cache misses by a factor of N. Give the ratio of the 
old CPU execution time to the new CPU execution time, as a function of N. 


(c) (4 marks) Consider a system with two classes of instructions A and B. Class A instructions 
have a CPI of 2, while class B instructions have a CPI of 1. The clock rate is 2 GHz. A 
particular application executes 100 million class A instructions, and 300 million class B 
instructions. 

(i) What is the length of a clock cycle in nanoseconds? 


(ii) What is the CPI? 


(iii) What is the CPU execution time in seconds? 


(iv) What is the MIPS rating?
(d) (4 marks) Suppose that a new, complex machine language instruction is implemented for an
operation that previously had to be done with a sequence of simpler instructions. For each of
the following metrics, if it changed after the new instruction was added, would you expect it
to have increased or decreased, and why?

(i) clock cycle time 

(ii) CPI 

(iii) instruction count 

(iv) MIPS rating 


(e) (2 marks) Consider a system with two classes of instructions A and B. Class A instructions
have a CPI of 4, while class B instructions have a CPI of 2. The clock cycle time is 0.5 
nanoseconds. If a particular application executes 800 million instructions and has a CPU 
execution time of 1 second, what fraction of the instructions executed must be of class A? 


3. Arithmetic (16 marks in total)

(a) (2 marks) Suppose that A, B, and C are three floating point variables. Can (A+B)+C ever not
yield the same result as A+(B+C), when evaluated on a computer? Explain your answer. 


(b) (2 marks) Suppose that A, B, and C are three unsigned integer variables, operated on using 
“addu”. Can (A+B)+C ever not yield the same result as A+(B+C)? Explain your answer. 
(c) (2 marks) Give a diagram showing how a ripple carry adder capable of adding two 4-bit
binary numbers a₃a₂a₁a₀ and b₃b₂b₁b₀ can be constructed from 1-bit full adders. Your diagram
should clearly show the inputs and outputs of each 1-bit full adder and the connections
between them.


(d) (6 marks) Perform the following conversions:

(i) The 5-bit two’s complement number 11010 to decimal.

(ii) ABCD₁₆ to binary.

(iii) 17₁₃ to decimal.

(iv) -3₁₀ to five-bit biased notation with a bias of 15.

(v) 0.2₁₀ to binary.

(vi) The 5-bit sign-magnitude number 10001 to decimal.
(e) (4 marks) Give a truth table for a 2 data input, 1 selector input multiplexor, and a
corresponding minimal logic equation in sum-of-products form. 


4. Machine and Assembly Language (20 marks in total) 

(a) (5 marks) In MIPS assembly language, one can refer to a memory location using a symbolic
name, as in “lw $a0, A”. It is not possible to use symbolic names in machine code, however.
Outline how the assembler creates machine code from “lw $a0, A”, and how this code may be
modified by the linker.


(b) (2 marks) The MIPS instruction set architecture has 32 (integer) registers. Suppose that using 
a new technology for implementation of the register file, 128 registers could be provided 
rather than just 32, without having to increase the length of a clock cycle. What problems, if 
any, would arise if the instruction set architecture was modified to have 128 rather than 32 
registers? 
(c) (2 marks) One way to “link” separately developed pieces of code is to combine the source
files manually using a text editor. When else can “linking” occur? 


(d) (2 marks) MIPS uses a load-store or register-register architectural style. List three other
styles of instruction set architecture. 


(e) (4 marks) Translate the following code into an equivalent sequence of MIPS assembly
language instructions, assuming that register $s0 corresponds to the integer variable “i” and
that register $s1 holds the base address of the integer array A (with elements indexed starting
from 0). Clearly state the purpose of any other registers that you may decide to use.

    if (A[i] > 0)
        A[i] = A[i] + 1;
    else
        A[i] = A[i] - 1;
    i = i + 1;





(f) (6 marks) Consider a linked list data structure in which each node is implemented with two
consecutive words of memory. The first word of each node contains an integer value. The
second word contains the memory address of (the first word of) the next node in the list. A
memory address value of zero indicates the end of the list. Write a MIPS procedure “reverse”
that takes as its argument the address of the first node in the list, and reverses the list (i.e., the
first node becomes the last node, the second node becomes the second-to-last node, and so on).
Your procedure should return the address of the new first node. You must use the standard
procedure calling conventions. (You do NOT need to write a main program.)


5. Datapath and Control (21 marks in total)

(a) (4 marks) Outline the main approaches to dealing with branch hazards.
(b) (2 marks) Give a high-level outline
implementation of the MIPS architecture.






(c) (2 marks) Consider the single-cycle, multicycle, and pipelined MIPS implementations
discussed in class.

(i) With which one of these implementations would you expect to have the highest CPI?


(ii) With which one would you expect to be able to achieve the highest clock rate? 

(iii) With which one would you expect to have the lowest clock rate? 


(d) (10 marks in total - 1 mark for each answer) Assume that the operation times for the major
functional units in a MIPS datapath implementation are as follows (operation times for other 
units are assumed negligible): 

Memory units (read or write): 1 nanosecond 

ALU and adders: 1 nanosecond 

Register file (read or write): 0.5 nanoseconds 

(i) To build a single-cycle datapath on which a subset of the R-format instructions (add,
sub, and, or, slt) could be executed, and no other types of instructions, what would be
the minimum clock cycle time?

(ii) To build a single-cycle datapath on which the store instruction sw could be executed, 
and no other types of instructions, what would be the minimum clock cycle time? 

(iii) What would be the minimum clock cycle time for a single-cycle datapath on which the 
load instruction lw could be executed?
(iv) What would be the minimum clock cycle time for a multicycle datapath on which the
R-format, sw, and lw instructions could be executed? How many clock cycles would
be required for each of the three types of instructions on a multicycle datapath that
achieved this minimum?

Clock cycle time: ____ ns

R-format instruction: ____ clock cycles

sw instruction: ____ clock cycles

lw instruction: ____ clock cycles

(v) What would be the minimum clock cycle time for a pipelined datapath on which the
R-format, sw, and lw instructions could be executed? What is the total execution time
for a single instruction? What is the total execution time for 10 instructions (assuming
no data hazards)?

Clock cycle time: ____ ns

Execution time for a single instruction: ____ ns

Execution time for 10 instructions: ____ ns


(e) (2 marks) In general, how might compilers (or programmers, through hand-tuning) structure 
code so as to achieve improved performance on a pipelined processor? 


6. Cache and Virtual Memory (19 marks in total)

(a) (2 marks) Outline how the use of virtual memory can provide protection among multiple
concurrently executing applications.
(b) (3 marks) Give three examples of how locality is exploited in the management of the memory
hierarchy. For each example, state the type of locality being exploited.


(c) (6 marks in total) Consider a cache with 8K bytes of data storage. 

(i) (1 mark) Suppose the cache is 4-way set associative and that the block size is 1
word. How many sets does the cache have?


(ii) (2 marks) With the cache organization of part (i), which set would be checked on a
reference to the memory (byte) address 3200₁₀?


(iii) (1 mark) Suppose now that the cache is direct-mapped, and that the block size is 8 
words. How many positions does the cache have? 


(iv) (2 marks) For the cache organization of part (iii), give the memory address of a
byte that is part of a different memory block than the byte with memory address
9600₁₀, and yet whose block would be placed in the same cache position.


(d) (2 marks) Consider a computer system in which a physical page frame number is 18 bits, a 
virtual page number is 48 bits, and a virtual address is 64 bits. What is the maximum amount 
of physical memory (in Gbytes) that this system could have? 
(e) (4 marks) Consider the following portion of a page table from a system with 2K byte pages.
All values are given in decimal.

virtual page number    physical page frame number
0                      62
1                      0
2                      3
3                      1
4                      1620
5                      5


(i) Which virtual page contains the word with virtual (byte) address 8000₁₀?


(ii) What is the page offset for this word? 


(iii) In which physical page frame is it contained? 


(iv) What is the word’s physical memory address? 


(f) (2 marks) In general, how might compilers (or programmers, through hand-tuning) structure
code so as to achieve improved performance on a processor with caches and virtual memory? 


(The End) 




